CONFIDENTIAL



>CPN1006 86 protein-export membrane protein SecD 

VSS PILNVPLKNHASVSGKFTHREVSKLASDLKSGAMSFVPEVLSEETISSDLGKKQCTQGI I SACCGLAMLIVL 

MSVTYRFGGVIASGAVLL^LLIWAALQYI^DAP 

KGYTiCAFGAIFDS^TTVLASALLFFLDTGPIK^^ 

FVG I KH D F LRGC KKLW AVSG S VFLLGC V ALG FGAWNS VLGMD F KGG Y AFT FN PK E H G I S DVAQMRGKWH K.LQ EA 
GLSSRDFRIQTFGSSEKIKIYFSDKSFKLYXSRYE PLSXNXRSXAGVNVGLLSETGLDFSTETLNETQNFWSKVS 
SKLSKKMRYQATIGLLGALAIILLWSLRFEWQYAFSAVCALIHDLLATCAVLFIAHFFLKKIQIDLQAIGALMT 
\njGYSLNl^LIIFDRIREDRQAlsnjFTPMHVLVl^ SVFNFAFIMTIG 
ILLGTLSSLYIAPPLLL 



ACTUAL ENCODED SEQUENCE 



   1 MVSSPILNVP LKNHASVSGK FTHREVSKLA SDLKSGAMSF VPEVLSEETI
  51 SSDLGKKQCT QGIISACCGL AMLIVLMSVY YRFGGVIASG AVLLNLLLIW
 101 AALQYLDAPL TLSGLAGIVL AMGMAVDANV LVFERIREEF LLSQSLKKSV
 151 EKGYTKAFGA IFDSNLTTVL ASALLFFLDT GPIKGFALTL ILGIFSSMFT
 201 ALFMTKFFFM LWMNKTQHTQ LHMMNKFVGI KHDFLRGCKK LWAVSGSVFL
 251 LGCVALGFGA WNSVLGMDFK GGYAFTFNPK EHGISDVAQM RGKVVHKLQE
 301 AGLSSRDFRI QTFGSSEKIK IYFSDKALSY TKQIRASLLK LTIMSWRYCG
 351 IVVRNRPRFL YGNSKRNAKF WSKVSSKLSK KMRYQATIGL LGALAIILLY
 401 VSLRFEWQYA FSAVCALIHD LLATCAVLFI AHFFLKKIQI DLQAIGALMT
 451 VLGYSLNNTL IIFDRIREDR QANLFTPMHV LVNDALQKTF SRTVMTTATT
 501 LSVLLMLLFI GGSSVFNFAF IMTIGILLGT LSSLYIAPPL LLFMVRKENR
 551 SK












CODING SEQUENCE 

THE PROTEIN IS ENCODED ON THE POSITIVE STRAND 
The ATG is presumably the start codon 
The TAA is presumably the stop codon 
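
The start and stop annotations above can be checked with a short sketch that walks one reading frame codon by codon until it meets a stop codon. The fragment is nucleotides 1741 to 1770 as printed in the restriction map of this listing. The function name `find_stop` and the frame offset passed to it are illustrative assumptions, not part of the original analysis.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_stop(seq, start):
    """Return the 0-based index of the first in-frame stop codon
    at or after `start`, or None if the frame stays open."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

# Nucleotides 1741-1770 of the listing. The frame that opens at the
# ATG at position 101 has a codon boundary at nucleotide 1742,
# which is index 1 of this fragment.
tail = "AGAAAATCGCTCAAAATAAGTACCGTTAAA"
first_nt = 1741

stop_index = find_stop(tail, 1)
stop_position = first_nt + stop_index  # 1-based coordinate in the listing
print(tail[stop_index:stop_index + 3], stop_position)  # TAA 1757
```

Under these assumptions the stop codon directly follows nucleotide 1756, the last position shown in the sequence alignment.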



   1 ATGGACTTCC GCATATTGTC AGGAGGGGAT CAGCGGCACT GCTAATGGAC
  51 AATATTCTGC AAACCGTGGA TGGCGTATGG CTGTAGTGAT TGACGGTTAT
 101 ATGGTCAGCA GCCCTATTTT AAACGTCCCA TTGAAAAATC ATGCCAGTGT
 151 CTCAGGGAAA TTTACCCACC GTGAAGTGAG CAAACTCGCC TCAGATTTAA
 201 AATCTGGAGC GATGTCTTTT GTTCCCGAGG TTCTCAGTGA AGAGACGATC
 251 TCTTCTGATC TTGGGAAAAA ACAATGTACA CAAGGCATTA TCTCAGCATG
 301 CTGTGGCTTG GCAATGCTTA TTGTTTTGAT GAGCGTATAT TATAGATTTG
 351 GAGGCGTCAT CGCTTCGGGA GCTGTTCTTC TGAATCTTTT GCTTATCTGG
 401 GCAGCTCTAC AGTATTTGGA TGCGCCACTC ACCTTGTCAG GACTCGCTGG
 451 GATTGTTCTT GCTATGGGGA TGGCCGTAGA TGCAAATGTT CTTGTATTCG
 501 AAAGAATCCG AGAGGAATTT TTATTGTCTC AAAGTCTTAA AAAATCTGTA
 551 GAAAAAGGAT ATACCAAGGC TTTTGGAGCC ATTTTTGATT CTAACTTGAC
 601 TACAGTATTG GCCTCAGCAC TTCTTTTCTT CCTAGATACA GGGCCTATTA
 651 AAGGGTTTGC TTTGACATTG ATTTTAGGAA TTTTCTCTTC AATGTTTACG
 701 GCTCTTTTCA TGACTAAATT TTTCTTCATG CTGTGGATGA ATAAGACCCA
 751 ACATACACAG TTGCATATGA TGAATAAGTT CGTGGGGATA AAGCATGATT
 801 TCTTGAGAGG ATGCAAAAAA CTTTGGGCTG TTTCTGGAAG TGTTTTTCTT
 851 TTAGGTTGCG TTGCTCTCGG GTTTGGAGCC TGGAATTCCG TTTTGGGAAT
 901 GGATTTTAAA GGAGGGTATG CCTTTACCTT TAATCCAAAA GAGCATGGCA
 951 TCAGCGATGT TGCTCAAATG CGTGGCAAAG TTGTGCATAA ACTACAGGAA
1001 GCTGGTCTTT CTTCTAGAGA CTTCCGTATT CAAACATTTG GATCTTCAGA
1051 AAAGATCAAA ATCTATTTTA GTGATAAAGC TTTAAGCTAT ACTAAGCAGA
1101 TACGAGCCTC TCTCCTAAAA TTAACGATCA TGAGCTGGCG TTATTGTGGG
1151 ATTGTTGTCA GAAACAGGCC TAGATTTCTC TACGGAAACT CTAAACGAAA
1201 CGCAAAATTT TGGTCAAAGG TAAGCAGCAA ACTATCGAAG AAAATGCGTT
1251 ATCAGGCGAC CATCGGGCTT TTAGGAGCTT TGGCAATCAT CTTGCTCTAT
1301 GTGAGTTTGC GCTTTGAATG GCAATATGCT TTCAGTGCCG TATGCGCTTT
1351 AATTCATGAC CTTTTGGCTA CCTGTGCAGT CTTGTTTATA GCACATTTCT
1401 TTTTGAAGAA AATTCAAATA GATTTGCAAG CCATTGGTGC TTTAATGACT
1451 GTATTGGGGT ATTCATTAAA CAATACTTTG ATCATTTTTG ATCGTATTCG



1501 TGAAGATCGC CAAGCGAACC TGTTTACCCC TATGCATGTT TTAGTTAATG
1551 ATGCCCTTCA AAAGACGTTC AGCCGCACGG TAATGACAAC AGCTACAACT
1601 CTATCAGTTT TGTTAATGCT TTTGTTTATA GGCGGCTCCT CTGTCTTTAA
1651 TTTTGCATTT ATTATGACCA TAGGGATTCT TCTAGGAACT TTATCGTCTC
1701 TTTATATTGC ACCACCTCTG TTGTTGTTTA TGGTCCGTAA AGAAAATCGC
1751 TCAAAATAAG TACCGTTAAA CTTAATCTAA CGTGTAGCAA TATAAAAATC
1801 TCCTTTGGGA CTTTAGTCCC AAAGGCCCCT GTGGTATTAA ATTTATGACA
1851 AATTCAGATA ATGC



SEQUENCE ALIGNMENT



 101 ATGGTCAGCAGCCCTATTTTAAACGTCCCATTGAAAAATCATGCCAGTGT 150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
   1 MetValSerSerProIleLeuAsnValProLeuLysAsnHisAlaSerVa 17

 151 CTCAGGGAAATTTACCCACCGTGAAGTGAGCAAACTCGCCTCAGATTTAA 200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  18 lSerGlyLysPheThrHisArgGluValSerLysLeuAlaSerAspLeuL 34

 201 AATCTGGAGCGATGTCTTTTGTTCCCGAGGTTCTCAGTGAAGAGACGATC 250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  35 ysSerGlyAlaMetSerPheValProGluValLeuSerGluGluThrIle 50

 251 TCTTCTGATCTTGGGAAAAAACAATGTACACAAGGCATTATCTCAGCATG 300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  51 SerSerAspLeuGlyLysLysGlnCysThrGlnGlyIleIleSerAlaCy 67

 301 CTGTGGCTTGGCAATGCTTATTGTTTTGATGAGCGTATATTATAGATTTG 350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  68 sCysGlyLeuAlaMetLeuIleValLeuMetSerValTyrTyrArgPheG 84

 351 GAGGCGTCATCGCTTCGGGAGCTGTTCTTCTGAATCTTTTGCTTATCTGG 400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
  85 lyGlyValIleAlaSerGlyAlaValLeuLeuAsnLeuLeuLeuIleTrp 100

 401 GCAGCTCTACAGTATTTGGATGCGCCACTCACCTTGTCAGGACTCGCTGG 450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 101 AlaAlaLeuGlnTyrLeuAspAlaProLeuThrLeuSerGlyLeuAlaGl 117

 451 GATTGTTCTTGCTATGGGGATGGCCGTAGATGCAAATGTTCTTGTATTCG 500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 118 yIleValLeuAlaMetGlyMetAlaValAspAlaAsnValLeuValPheG 134

 501 AAAGAATCCGAGAGGAATTTTTATTGTCTCAAAGTCTTAAAAAATCTGTA 550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 135 luArgIleArgGluGluPheLeuLeuSerGlnSerLeuLysLysSerVal 150

 551 GAAAAAGGATATACCAAGGCTTTTGGAGCCATTTTTGATTCTAACTTGAC 600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 151 GluLysGlyTyrThrLysAlaPheGlyAlaIlePheAspSerAsnLeuTh 167

 601 TACAGTATTGGCCTCAGCACTTCTTTTCTTCCTAGATACAGGGCCTATTA 650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 168 rThrValLeuAlaSerAlaLeuLeuPhePheLeuAspThrGlyProIleL 184

 651 AAGGGTTTGCTTTGACATTGATTTTAGGAATTTTCTCTTCAATGTTTACG 700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 185 ysGlyPheAlaLeuThrLeuIleLeuGlyIlePheSerSerMetPheThr 200

 701 GCTCTTTTCATGACTAAATTTTTCTTCATGCTGTGGATGAATAAGACCCA 750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 201 AlaLeuPheMetThrLysPhePhePheMetLeuTrpMetAsnLysThrGl 217

 751 ACATACACAGTTGCATATGATGAATAAGTTCGTGGGGATAAAGCATGATT 800
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 218 nHisThrGlnLeuHisMetMetAsnLysPheValGlyIleLysHisAspP 234

 801 TCTTGAGAGGATGCAAAAAACTTTGGGCTGTTTCTGGAAGTGTTTTTCTT 850
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 235 heLeuArgGlyCysLysLysLeuTrpAlaValSerGlySerValPheLeu 250

 851 TTAGGTTGCGTTGCTCTCGGGTTTGGAGCCTGGAATTCCGTTTTGGGAAT 900
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 251 LeuGlyCysValAlaLeuGlyPheGlyAlaTrpAsnSerValLeuGlyMe 267

 901 GGATTTTAAAGGAGGGTATGCCTTTACCTTTAATCCAAAAGAGCATGGCA 950
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 268 tAspPheLysGlyGlyTyrAlaPheThrPheAsnProLysGluHisGlyI 284

 951 TCAGCGATGTTGCTCAAATGCGTGGCAAAGTTGTGCATAAACTACAGGAA 1000
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 285 leSerAspValAlaGlnMetArgGlyLysValValHisLysLeuGlnGlu 300

1001 GCTGGTCTTTCTTCTAGAGACTTCCGTATTCAAACATTTGGATCTTCAGA 1050
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 301 AlaGlyLeuSerSerArgAspPheArgIleGlnThrPheGlySerSerGl 317

1051 AAAGATCAAAATCTATTTTAGTGATAAAGCTTTAAGCTATACTAAGCAGA 1100
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 318 uLysIleLysIleTyrPheSerAspLysAlaLeuSerTyrThrLysGlnI 334

1101 TACGAGCCTCTCTCCTAAAATTAACGATCATGAGCTGGCGTTATTGTGGG 1150
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 335 leArgAlaSerLeuLeuLysLeuThrIleMetSerTrpArgTyrCysGly 350

1151 ATTGTTGTCAGAAACAGGCCTAGATTTCTCTACGGAAACTCTAAACGAAA 1200
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 351 IleValValArgAsnArgProArgPheLeuTyrGlyAsnSerLysArgAs 367

1201 CGCAAAATTTTGGTCAAAGGTAAGCAGCAAACTATCGAAGAAAATGCGTT 1250
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 368 nAlaLysPheTrpSerLysValSerSerLysLeuSerLysLysMetArgT 384

1251 ATCAGGCGACCATCGGGCTTTTAGGAGCTTTGGCAATCATCTTGCTCTAT 1300
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 385 yrGlnAlaThrIleGlyLeuLeuGlyAlaLeuAlaIleIleLeuLeuTyr 400

1301 GTGAGTTTGCGCTTTGAATGGCAATATGCTTTCAGTGCCGTATGCGCTTT 1350
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 401 ValSerLeuArgPheGluTrpGlnTyrAlaPheSerAlaValCysAlaLe 417

1351 AATTCATGACCTTTTGGCTACCTGTGCAGTCTTGTTTATAGCACATTTCT 1400
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 418 uIleHisAspLeuLeuAlaThrCysAlaValLeuPheIleAlaHisPheP 434

1401 TTTTGAAGAAAATTCAAATAGATTTGCAAGCCATTGGTGCTTTAATGACT 1450
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 435 heLeuLysLysIleGlnIleAspLeuGlnAlaIleGlyAlaLeuMetThr 450

1451 GTATTGGGGTATTCATTAAACAATACTTTGATCATTTTTGATCGTATTCG 1500
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 451 ValLeuGlyTyrSerLeuAsnAsnThrLeuIleIlePheAspArgIleAr 467

1501 TGAAGATCGCCAAGCGAACCTGTTTACCCCTATGCATGTTTTAGTTAATG 1550
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 468 gGluAspArgGlnAlaAsnLeuPheThrProMetHisValLeuValAsnA 484

1551 ATGCCCTTCAAAAGACGTTCAGCCGCACGGTAATGACAACAGCTACAACT 1600
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 485 spAlaLeuGlnLysThrPheSerArgThrValMetThrThrAlaThrThr 500

1601 CTATCAGTTTTGTTAATGCTTTTGTTTATAGGCGGCTCCTCTGTCTTTAA 1650
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 501 LeuSerValLeuLeuMetLeuLeuPheIleGlyGlySerSerValPheAs 517

1651 TTTTGCATTTATTATGACCATAGGGATTCTTCTAGGAACTTTATCGTCTC 1700
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 518 nPheAlaPheIleMetThrIleGlyIleLeuLeuGlyThrLeuSerSerL 534

1701 TTTATATTGCACCACCTCTGTTGTTGTTTATGGTCCGTAAAGAAAATCGC 1750
     ||||||||||||||||||||||||||||||||||||||||||||||||||
 535 euTyrIleAlaProProLeuLeuLeuPheMetValArgLysGluAsnArg 550

1751 TCAAAA 1756
     ||||||
 551 SerLys 552
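
The alignment above pairs each codon with the residue it encodes. As a minimal sketch of that translation step, assuming the standard genetic code, the snippet below translates nucleotides 101 to 150 and reproduces the first residues of the encoded sequence. The names `CODON_TABLE` and `translate` are illustrative and do not come from the software that produced this report.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate complete codons from the start of `dna`, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Nucleotides 101-150 of the coding sequence (first row of the alignment).
fragment = "ATGGTCAGCAGCCCTATTTTAAACGTCCCATTGAAAAATCATGCCAGTGT"
print(translate(fragment))  # MVSSPILNVPLKNHAS
```

The trailing two nucleotides of the fragment do not form a complete codon and are ignored, which is why the alignment row ends with the partial residue "Va".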



RESTRICTION MAP 



Hpyl78lII 
Acil Mnll |Sau3AI 

I 



Alwl 
Fnu4HI 
Taul 
Acil 
MspAlI 
Dpnl 
C jel 



TspRI 
BtSl I 



Cjel CjePI 
Sspl CviRI I 



ATGG AC TTCC GC AT ATTGTC AGG AGGGG ATC AGC GGCAC TGC T AATGG AC AATATTCTGC 

1 + h k 1 + 

T ACC TG AAGGC GTAT AAC AGTC CTC CC CT AGTCGC C GTG AC G ATTAC CTGTTATAAG AC G 



60 



Taal 
RsaJT 
BstDSI 
Cjel I 



Fokl 
Sfcl 
CviJI I 



Taal 

Cjel CjePI I 



CviJI 
BsmFI 
Fnu4HI 
Tsel I 



Bbvi 
Dral 
Msel | 
I 



Bed 
I 

AAAC CGTGG ATGGC GT ATGGCTGT AGTG ATTG AC GGTT ATATGGTC AGC AG CC C T ATTTT 
61 + + + + + + 120 

TTTGGCACCTACCGCATACCGACATCACTAACTGCCAATATACCAGTCGTCGGGATAAAA 



Apol 
Tsp509l 
BsrI BsmAI 
MslI Ddel 

Maell Nlallll TopRI BseMII Taal 

I Ml 

AAAC GTCC C ATTG AAAAATCATGCCAGTGTCTCAGGGAAATTT ACC C ACC GTG AAGTG AG 

121 + + + + + + 180 

TTTGC AC GC T AACTTTTT AGTAC GGTC AC AGAGTC C C TTT AAATGGGTGGC AC TTC AC TC 



Hpyl78lII 

BseMII ' 
Mnll 
Dral 



BsmAI 
BsmBI 
Earl 
Ddel 
Sthl32I 
Bpml 
BsaJI 
Hpyl78III 
Aval ' 



HaelV 
Hpyl88lX Hin4l 

Ddel I Msel | Mnll 
I I I 

C AAAC TC GC CTC AG ATTT AAAATC TGG AGC G ATGTC TTTTGTTC CC GAGGTTCTC AGTG A 

181 + + + + + + 240 

GTTTG AGC GG AGTC TAAATTTT AG ACC TC GC T AC AG AAAAC AAGGGC TCC AAGAGTC AC T 

Dpnl 
Earl 
Sau3AI 
Hpyl88IX 
MboII 
Dpnl 
BseMII 
Hin4I 
Sau3AI 
MboII 
TspRI I 



Rsal 
BsrGI 
BplI Tat I 

I I ... 

AGAGACGATCTCTTCTGATCTTGGGAAAAAACAATGTACACAAGGCATTATCTCAGCATG 

241 + + + + + + 

TCTCTGCTAGAGAAGACTAGAACCCTTTTTTGTTACATGTGTTCCGTAATAGAGTCGTAC 



Nlalll 
Nspl 
SphI 

Ddel Cac8l | 
I I 



300 



301 



Acelll 
Sthl32l 

BseMII Mnll Hin4I 

CviJI BsrDT Hgall BsaHI 

II Ml 

CTGTGGCTTGGCAATGCTTATTGTTTTGATGAGCGTATATTATAGATTTGGAGGCGTCAT

+ + + + H + 

GACACCGAACC GTT ACG AAT AAC AAAAC T AC TC GC ATAT AATATCT AAAC C TC CGC AGT A 



360 



Alul 
CviJI 
MboII 
MWOI 
Hpyl78IH| 



Acelll 
Bbvl 
Taal 
SfaNI 
Sfcl 



Alul 
CviJI 
Fnu4HI 
Tsel 
Cjel I 



Hinfl 
Tf il 
Hpyl88IX I 
I I 

CGCTTCGGGAGCTGTTCTTCTGAATCTTTTGCTTATCTGGGCAGCTCTACAGTATTTGGA

361 + + + + + + 420

GC GAAGC CC TC G AC AAG AAG AC TTAG AAAAC G AATAG AC C C GTC G AG ATGTC AT AAAC C T 



Hhal 
HphI I 
I 



Hinfl 
Hpyl78lII 
Plel 
Cjel 



Fokl 



Beef I 



CviJI 
Haelll 
Bed 
Eael 
Gdill 
SfaNI | 
I 



Mwol 



TGCGCCACTCACCTTGTCAGGACTCGCTGGGATTGTTCTTGCTATGGGGATGGCCGTAGA

421 + + + + + + 480 

ACGCGGTGAGTGGAACAGTCCTGAGCGACCCTAACAAGAACGATACCCCTACCGGCATCT 



Hpyl8 8ix 
Mnll 

CviRI NspV Hinfl Apol 

Fokl I Taql Tfil Tsp509l BsmAI Msel 

ll III II 

TGC AAATGTTC TTGTATTC G AAAG AATCCG AG AGG AATTTTT ATTGTC TC AAAGTC TTAA 

481 + + + + + + 540 

ACGTTTACAAGAACATAAGCTTTCTTAGGCTCTCCTTAAAAATAACAGAGTTTCAGAATT 

CviJI CviJI 
BsaJI NlalV Hinfl 

Sfcl Styl Mwol I Tfil Sfcl 

I I I I I I 

AAAATCTGT AG AAAAAGG AT AT AC C AAGGC TTTTGG AGC C ATTTTTG ATTC T AAC TTG AC 

541 + + + + + + 600 

TTTT AG AC ATCTTTTTCCT AT ATGGTTCCGAAAACCTCGGTAAAAACT AAG ATTG AAC TG 



BbvCI 
BpulOl 
Ddel 
CviJI 
Hael 
Taal Haelll 



BseMII 
Mnll 
MboII I 



CviJI 
Haelll 
EcoO109I 
Bfal Sau96l 



BslI 



EcoNI 
Msel 



I 



TACAGTATTGGCCTCAGCACTTCTTTTCTTCCTAGATACAGGGCCTATTAAAGGGTTTGC 

601 + + + + + + 660 

ATGTC AT AAC C GG AGTC GTG AAG AAAAG AAGG ATC T ATGTC C C GG AT AATTTC CC AAAC G 



ApoT 
Tsp509l 

Mboll 
Beef I 

Apol Nlalll 
Mboll Hpyl78lII 
Tsp509I Earl CviJI Real 

I I II. 

TTTGACATTGATTTTAGGAATTTTCTCTTCAATGTTTACGGCTCTTTTCATGACTAAATT

661 + + + + + + 720

AAAC TGTAAC TAAAATC CTT AAAAG AG AAGTT AC AAATGC CG AG AAAAGT ACTG ATTTAA 



Ndel 

Fokl CviRI 
Nlalll SimI I Taal I XmnI 

i i I il I 

TTTCTTCATGCTGTGGATGAATAAGACCCAACATACACAGTTGCATATGATGAATAAGTT

721 + + + + + + 780 

AAAG AAGTAC G AC AC C T AC TT ATTCTGGGTTGT ATGTGTC AAC GT AT AC T AC TT ATTC AA 



Hpyl78lII 
Smll 
Mnl I 
SfaNl 

Nlalll I CviRI 

I I ... 
CGTGGGGATAAAGCATGATTTCTTGAGAGGATGCAAAAAACTTTGGGCTGTTTCTGGAAG 

781 + + + + + + 840 

GCACCCCTATTTCGTACTAAAGAACTCTCCTACGTTTTTTGAAACCCGACAAAGACCTTC 



Hpyl78lII 
CviJI 
Bce83l 
Fokl I 



Sthl32I 



Aval 



Apol 
EcoRI 
Tsp509I 
ScrFI 
CviJI 
EcoRII 
Nlaivl 



TGTTTTTC TTTT AGGTTGC GTTGC TCTC GGGTTTGG AGC C TGG AATTC C GTTTTGGG AAT 
841 + + + + + + 900 

AC AAAAAG AAAATC C AAC GC AAC GAG AGC C C AAAC C TC GG AC C TT AAGGC AAAAC C C TTA 



901 



Dral 
Msel 

Mill 1 1 Msei Nlalll SfaNl 

II III 
GGATTTTAAAGGAGGGTATGCCTTTACCTTTAATCCAAAAGAGCATGGCATCAGCGATGT 

+ + + + + + 

CCTAAAATTTCCTCCCATACGGAAATGGAAATTAGGTTTTCTCGTACCGTAGTCGCTACA



960 



Hpyl78III 
Mboll Bfal" 
Alul Xbal 
CviRI Sfcl CviJI BsmAI I 

II I I I 

TGCTCAAATGCGTGGCAAAGTTGTGCATAAACTACAGGAAGCTGGTCTTTCTTCTAGAGA 

961 + + + + + + 1020

ACGAGTTTACGCACCGTTTCAACACGTATTTGATGTCCTTCGACCAGAAAGAAGATCTCT 



BsaBI 
Dpnl 
Sau3AI 
Alwl I 
Hpyl88lX| I 



Tthlllll 
Dpnl 
BstYI 
Sau3AI 
Eco57l MboII | 
I 



Alul 
CviJI 
Hindlll I 



C TTC CGT ATTC AAAC ATTTGGATCTTC AG AAAAGATC AAAATC T ATTTT AGTG AT AAAGC 

1021 + + + + + + 1080 

GAAGGCATAAGTTTGTAAACCTAGAAGTCTTTTCTAGTTTTAGATAAAATCACTATTTCG

Cac8I 
RleAI 

Alul 
CviJI 
Nlalll 
Hpyl78lII 
Real 
BplI 



Alul 
CviJI 
Msel I Ddel 



Dpnl 
Sau3AI 
Msel 
Acelll 



Hin4I 
CviJI I 



Tsp509l 
Mnll I 



TTTAAGCTATACTAAGCAGATACGAGCCTCTCTCCTAAAATTAACGATCATGAGCTGGCG 

1081 + + + + + + 1140 

AAATTCGATATGATTCGTCTATGCTCGGAGAGAGGATTTTAATTGCTAGTACTCGACCGC 



Bfal 
CviJI 
Hael 
Haelll 
Hpyl88lX StuI 
I I 

TTATTGTGGGATTGTTGTCAGAAACAGGCCTAGATTTCTCTACGGAAACTCTAAACGAAA 

1141 + + + + + + 1200 

AATAACACCCTAACAACAGTCTTTGTCCGGATCTAAAGAGATGCCTTTGAGATTTGCTTT



Bcgl 

Apol Fnu4HI Bbvl Sthl32I 

Tsp509I Tsel 1 TaqI I MboII I Bcgl 

I i I it III 

CGCAAAATTTTGGTCAAAGGTAAGCAGCAAACTATCGAAGAAAATGCGTTATCAGGCGAC 



1201 



-+ 1260 



GC GTTTT AAAAC C AGTTTC C ATTC GTCGTTTG AT AGC TTC TTTT AC GC AAT AGTC C GC TG 



Alul 
CviJI 



Hhal 



Bed CviJI 

ill i 

CATCGGGCTTTTAGGAGCTTTGGCAATCATCTTGCTCTATGTGAGTTTGCGCTTTGAATG 



1261 



-+ 1320 



GT AGCC C G AAAATC C TCG AAACC GTTAGTAGAAC GAG AT AC AC TC AAAC GCG AAAC TT AC 

Nlalll 
Hpyl78lII 
Tsp509I 
Msel 

TspRI Hhal CviRI 
Beef I Mwol ImwoI I Real CviJI Mwol 

I I I I I I 

GCAATATGCTTTCAGTGCCGTATGCGCTTTAATTCATGACCTTTTGGCTACCTGTGCAGT 

1321 + + + + + + 1380 

CGTTATACGAAAGTCACGGCATACGCGAAATTAAGTACTGGAAAACCGATGGACACGTCA 



CviJI 

Apol Cac8I 
Bsgl Tsp509l MboII CviRI I Mwol 

I II III 

CTTGTTT AT AGC AC ATTTC TTTTTG AAG AAAATTC AAAT AGATTTGC AAGC C ATTGGTGC 

1381 + + + + + + 1440 

G AAC AAATATC GTGTAAAG AAAAAC TTC TTTTAAGTTT ATC T AAAC GTTC GGTAAC C AC G 



Dpnl 
Bell 
Sau3AI 



Msel Taal Msel 

I I I 

TTTAATGACTGTATTGGGGTATTCATTAAACAATACTTTGATCATTTTTGATCGTATTCG 



Dpnl 

Sau3AI |Hpyl78lII 



1441 



-+ 1500 



AAATT AC TG AC ATAAC CC C ATAAGTAATTTGTTATG AAAC TAGT AAAAAC T AGC AT AAGC 

SfaNI 
Nlalll 
Nspl 
Nsil 
CviRI I 



Dpnl 
Sau3AI I 



Msel 



MboII 
I 

TGAAGATCGCCAAGCGAACCTGTTTACCCCTATGCATGTTTTAGTTAATGATGCCCTTCA 

1501 + 1 + + + 1560 

AC TTC T AGC GGTTC GC TTGG AC AAATGGGG AT AC GTAC AAAATC AATTACT ACGGG AAGT 



Mae 1 1 



Acil 
Fnu4HI 

Taul 
CviJI I 



MslI 
Taal I 



Alul 
CviJI 



Msel 



AAAGACGTTCAGCCGCACGGTAATGACAACAGCTACAACTCTATCAGTTTTGTTAATGCT 

1561 + + + + + + 1620 

TTTCTGCAAGTCGGCGTGCCATTACTGTTGTCGATGTTGAGATAGTCAAAACAATTACGA 



Nlaiv 
CviJI 
Fnu4HI 
Taul 
BseRI Acil | 
I 



Mnll 
Tsp509I 
Msel | 
I 



CjePI 



CviRI 



Hinf I 
MboII Tfil 



TTTGTTTATAGGC GGCTCCTCTGTCTTTAATTTTGC ATTTATTATGACC ATAGGGATTC T 

1621 + + +- + + + 1680 

AAAC AAATATC CGC C G AGG AG AC AG AAATT AAAAC GT AAAT AATACTGGT ATCC C T AAG A 



BsmAI Avail 
Bfal CjePI BsmBI CviRI Mnll Sau96I 

I III II 

TCTAGGAACTTTATCGTCTCTTTATATTGCACCACCTCTGTTGTTGTTTATGGTCCGTAA 

1681 + + + + + + 1740 

AG ATC C TTG AAAT AGC AG AG AAAT AT AACGTGGTGG AG AC AAC AAC AAAT AC C AGGC ATT 



Msel 
Taal 
Rsal | 



Msel 



I 



Af 1III 
Maell 



AG AAAATC GC TC AAAATAAGT ACC GTT AAAC TT AATC T AACGTGT AGC AAT ATAAAAATC 

1741 + + + + + + 1800 

TCTTTTAGCGAGTTTTATTCATGGCAATTTGAATTAGATTGCACATCGTTATATTTTTAG 



NlalV 
CviJI | 
Haelll | 



Eco0109I Apol Hpyl88lX 

Sau96I Tsp509I Apol 

BsmFI PshAI BsmFI I Msel I Tsp509l 

I III I I I 

TC C TTTGGG ACTTTAGTCC C AAAGGC CCCTGTGGTATTAAATTTATG AC AAATTC AG AT A 

1801 + + + + + + 1860

AGG AAAC C CTG AAATC AGGGTTTCC GGGG AC AC C ATAATTTAAAT ACTGTTTAAGTC TAT 



ATGC 

1861 1864 

TACG 
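
Each row of the restriction map prints the top strand with its base-paired complement beneath it. The sketch below shows that relationship for the row covering nucleotides 781 to 840, copied from the map above. The helper names `complement` and `reverse_complement` are illustrative assumptions.

```python
# Watson-Crick pairing for unambiguous DNA bases.
PAIRING = str.maketrans("ACGT", "TGCA")

def complement(strand):
    """Base-by-base complement, in the same left-to-right order as the map."""
    return strand.translate(PAIRING)

def reverse_complement(strand):
    """Complementary strand read 5' to 3'."""
    return complement(strand)[::-1]

# Row 781-840 of the restriction map: top strand and the strand printed below it.
top = "CGTGGGGATAAAGCATGATTTCTTGAGAGGATGCAAAAAACTTTGGGCTGTTTCTGGAAG"
bottom = "GCACCCCTATTTCGTACTAAAGAACTCTCCTACGTTTTTTGAAACCCGACAAAGACCTTC"

print(complement(top) == bottom)  # True
print(complement("ATGC"))         # TACG, the final row (1861-1864)
```

The same check can be used to spot transcription errors in a map row, since any position where the two printed strands do not pair indicates a misread base.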

Enzymes that do cut : 



AceIII AciI AflIII AluI AlwI ApoI AvaI AvaII
BbvI BbvCI BccI Bce83I BcefI BcgI BclI BfaI
BplI BpmI Bpu10I BsaBI BsaHI BsaJI BseMII BseRI
BsgI BslI BsmAI BsmBI BsmFI BsrI BsrDI BsrGI
BstDSI BstYI BtsI Cac8I CjeI CjePI CviJI CviRI
DdeI DpnI DraI EaeI EarI Eco57I EcoNI EcoO109I
EcoRI EcoRII Fnu4HI FokI GdiII HaeI HaeIII HaeIV
HgaI HhaI Hin4I HindIII HinfI HphI Hpy178III Hpy188IX
MaeII MboII MnlI MseI MslI MspA1I MwoI NdeI
NlaIII NlaIV NsiI NspI NspV PleI PshAI RcaI
RleAI RsaI Sau96I Sau3AI ScrFI SfaNI SfcI SimI
SmlI SphI SspI Sth132I StuI StyI TaaI TaqI
TatI TauI TfiI TseI Tsp509I TspRI Tth111II XbaI
XmnI

















Enzymes that do not cut : 



AarI AatII AccI AclI AflII AhdI AloI AlwNI
ApaI ApaLI AscI AvrII BaeI BamHI BanI BanII
BbsI BciVI BglI BglII BmgI BmrI Bpu1102I BsaI
BsaAI BsaWI BsaXI BsbI BscGI BseSI BsiEI BsiHKAI
BsmI Bsp24I Bsp1286I BspEI BspGI BspLU11I BspMI BsrBI
BsrFI BssHII BssSI BstAPI BstEII BstXI BstZ17I Bsu36I
ClaI DraIII DrdI DrdII EagI EciI Eco47III EcoRV
FauI FseI FspI HaeII HgiEII HincII HpaI KpnI
MaeIII MluI MmeI MscI MspI MunI NarI NciI
NcoI NgoAIV NheI NotI NruI PacI Pfl1108I PflMI
PinAI PmeI PmlI PpiI Psp5II PstI PvuI PvuII
RsrII SacI SacII SalI SanDI SapI SbfI ScaI
SexAI SfiI SgfI SgrAI SmaI SnaBI SpeI SrfI
Sse8647I SunI SwaI TaqII ThaI Tsp45I Tth111I VspI
XcmI XhoI
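
The two lists above sort enzymes by whether their recognition sequence occurs in the 1864-nucleotide sequence. A minimal sketch of that test follows, searching nucleotides 1001 to 1100 of the coding sequence for three well-known sites (XbaI TCTAGA, HindIII AAGCTT, BamHI GGATCC). The function name `find_sites` is illustrative, and the positions it reports are the first base of the recognition sequence, not the cleavage position.

```python
PAIRING = str.maketrans("ACGT", "TGCA")

def reverse_complement(site):
    return site.translate(PAIRING)[::-1]

def find_sites(seq, site, first_nt=1):
    """Return sorted 1-based start positions of `site` on either strand of `seq`."""
    hits = set()
    for pattern in {site, reverse_complement(site)}:
        start = seq.find(pattern)
        while start != -1:
            hits.add(start + first_nt)
            start = seq.find(pattern, start + 1)
    return sorted(hits)

# Nucleotides 1001-1100 of the coding sequence, in blocks of ten.
fragment = (
    "GCTGGTCTTT" "CTTCTAGAGA" "CTTCCGTATT" "CAAACATTTG" "GATCTTCAGA"
    "AAAGATCAAA" "ATCTATTTTA" "GTGATAAAGC" "TTTAAGCTAT" "ACTAAGCAGA"
)

SITES = {"XbaI": "TCTAGA", "HindIII": "AAGCTT", "BamHI": "GGATCC"}
for enzyme, site in SITES.items():
    print(enzyme, find_sites(fragment, site, first_nt=1001))
# XbaI [1013]
# HindIII [1077]
# BamHI []
```

This agrees with the lists for these three enzymes: XbaI and HindIII appear among the enzymes that cut, and BamHI among those that do not, at least within this fragment.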















>SW:SECF_MYCTU Q50635 Mycobacterium tuberculosis, protein-export membrane
protein SecF. 11/97
Length = 442

Score = 90 (42.3 bits), Expect = 5.9e-16, Sum P(5) = 5.9e-16
Identities = 30/118 (25%), Positives = 50/118 (42%) 

Query: 316 SEKIKIYFSDKALSYTKQIRASLLKLTIMSWRXCGIVVRNRPRFLYGNSKRNAKFWSKVS 375 

SE + ST QIR+ L + + P+G+AS VS 

Sbjct: 121 SEPQSWWGAGASATVQIRSETLTSDQTAKLRDALFEAFGPKGTDGQPSKQAISDSAVS 180 

Query: 376 SKLSKKMRYQATIGLLGALAIILLYVSLRFEWQYAFSAVCALIHDLLATCAVLFIAHF 433

+ + +A I L+ L ++ LY+++R+E SA+ A++ DL T V + F

Sbjct: 181 ETWGGQITKKAVIALWFLVLVALYITVRYERYMTISAITAMLFDLTVTAGVYSLVGF 238
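
The percentages in the "Identities" and "Positives" line are the match counts divided by the aligned length. The sketch below reproduces that arithmetic. The helper names are illustrative, and the two short strings passed to `count_identities` are only the first five residues of the Query and Sbjct lines above, used as a toy input rather than the full alignment.

```python
def percent(matches, aligned_length):
    """Whole-number percentage, as printed in the report."""
    return round(100 * matches / aligned_length)

def count_identities(query, subject):
    """Count positions where two equal-length aligned strings carry the same residue."""
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")

print(percent(30, 118))  # 25
print(percent(50, 118))  # 42

# Toy input: first five aligned residues of the first segment pair.
print(count_identities("SEKIK", "SEPQS"))  # 2
```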



